Rollout Allocation Strategies for Classification-based Policy Iteration

Authors

  • Victor Gabillon
  • Alessandro Lazaric
Abstract

Classification-based policy iteration algorithms are variants of policy iteration that do not use any kind of value function representation. The main idea is 1) to replace the usual value function learning step with rollout estimates of the value function over a finite number of states, called the rollout set, and over the actions in the action space, and 2) to cast the policy improvement step as a classification problem. The choice of rollout allocation strategy over states and actions has a significant impact on the performance and computation time of this class of algorithms. In this paper, we present new strategies for allocating the available budget (number of rollouts) over states and actions at each iteration of the algorithm. Our empirical results indicate that, for a fixed budget, the proposed strategies improve the accuracy of the training set over existing methods.
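
For concreteness, the following Python sketch shows how one iteration of such an algorithm could build its training set under a naive uniform allocation of the rollout budget, which is the baseline the proposed strategies improve on. The interface (`simulate(s, a) -> (next_state, reward)`, `rollout_set`, `actions`, `policy`) is an illustrative assumption, not code from the paper.

```python
def rollout_return(simulate, state, action, policy, horizon=20, gamma=0.95):
    """Estimate Q^pi(state, action) with one truncated rollout.

    `simulate(s, a) -> (next_state, reward)` is an assumed generative model."""
    total, s, a = 0.0, state, action
    for t in range(horizon):
        s, r = simulate(s, a)
        total += (gamma ** t) * r
        a = policy(s)  # after the first action, follow the current policy
    return total

def build_training_set(simulate, rollout_set, actions, policy, budget):
    """Naive baseline: spread `budget` rollouts uniformly over all
    state-action pairs, then label each state with its greedy action."""
    per_pair = max(1, budget // (len(rollout_set) * len(actions)))
    training_set = []
    for s in rollout_set:
        q = {a: sum(rollout_return(simulate, s, a, policy)
                    for _ in range(per_pair)) / per_pair
             for a in actions}
        training_set.append((s, max(q, key=q.get)))  # (state, label) pair
    return training_set
```

The labeled pairs are then fed to an off-the-shelf classifier whose output is the improved policy; smarter allocation shifts rollouts toward the states and actions where the greedy label is still uncertain.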

Similar resources

Classification-based Policy Iteration with a Critic

In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a ...
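
As a rough illustration of the truncation idea, here is a hypothetical sketch in which a critic supplies the tail of a truncated rollout; `simulate`, `policy`, and `critic` are assumed interfaces rather than the authors' implementation.

```python
def truncated_rollout_with_critic(simulate, state, action, policy, critic,
                                  horizon=10, gamma=0.95):
    """Run the rollout for `horizon` steps only, then let a learned critic
    `critic(s) ~ V^pi(s)` stand in for the remaining return. Shorter
    horizons cut rollout variance at the price of the critic's bias."""
    total, s, a = 0.0, state, action
    for t in range(horizon):
        s, r = simulate(s, a)
        total += (gamma ** t) * r
        a = policy(s)
    return total + (gamma ** horizon) * critic(s)  # bootstrap the tail
```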

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration

Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem, have been proposed recently. Finding good policies with such methods requires not only an appropriate classifier, but also reliable examples of best actions, covering the state space sufficiently. Up to this ti...
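
One way to make the "reliable examples of best actions" requirement concrete is to treat each rollout state as a best-arm-identification problem. The sketch below uses a generic successive-elimination scheme, an illustrative assumption rather than necessarily the allocation rule of this line of work, with `q_sample(a)` standing in for a single-rollout estimate of Q(s, a).

```python
import math

def identify_best_action(q_sample, actions, budget, delta=0.1):
    """Keep sampling only the actions whose confidence intervals still
    overlap the empirical leader's; stop when the budget runs out or a
    single candidate remains."""
    means = {a: q_sample(a) for a in actions}  # one initial rollout each
    counts = {a: 1 for a in actions}
    alive = list(actions)
    spent = len(actions)
    while spent < budget and len(alive) > 1:
        for a in list(alive):
            if spent >= budget:
                break
            means[a] = (means[a] * counts[a] + q_sample(a)) / (counts[a] + 1)
            counts[a] += 1
            spent += 1
        # Hoeffding-style radius; eliminate clearly dominated actions.
        radius = {a: math.sqrt(math.log(2 * len(actions) / delta)
                               / (2 * counts[a])) for a in alive}
        leader = max(alive, key=lambda a: means[a])
        alive = [a for a in alive
                 if means[a] + radius[a] >= means[leader] - radius[leader]]
    return max(alive, key=lambda a: means[a])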

Rollout Sampling Policy Iteration for Decentralized POMDPs

We present decentralized rollout sampling policy iteration (DecRSPI), a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout...
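
A heavily simplified sketch of the belief-sampling step is given below. It represents each reachable belief by state particles generated by forward simulation, and for brevity lets each agent's policy read the sampled state directly, whereas DEC-POMDP policies actually condition on local observation histories; all names are illustrative assumptions.

```python
def sample_reachable_beliefs(simulate_joint, sample_initial_state,
                             joint_policy, horizon, n_samples):
    """Simulate the multiagent system forward under the current joint
    policy and collect the states reached at every stage; the particles
    at stage t approximate the beliefs reachable at that stage."""
    particles = [[] for _ in range(horizon)]
    for _ in range(n_samples):
        s = sample_initial_state()
        for t in range(horizon):
            particles[t].append(s)
            joint_action = tuple(pi(s) for pi in joint_policy)
            s, _ = simulate_joint(s, joint_action)  # -> (next_state, reward)
    return particles
```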

Differential Training of Rollout Policies

We consider the approximate solution of stochastic optimal control problems using a neuro-dynamic programming / reinforcement learning methodology. We focus on the computation of a rollout policy, which is obtained by a single policy iteration starting from some known base policy and using some form of exact or approximate policy improvement. We indicate that, in a stochastic environment, the popu...
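
A minimal sketch of the rollout-policy construction described here, assuming a generative model `simulate(s, a) -> (next_state, reward)` and a given `base_policy` (the names are illustrative, not the paper's):

```python
def rollout_policy(simulate, state, actions, base_policy,
                   n_rollouts=10, horizon=20, gamma=0.95):
    """Score every action by Monte Carlo simulation under the base policy,
    then act greedily: a single step of approximate policy improvement."""
    def score(action):
        total = 0.0
        for _ in range(n_rollouts):
            s, acc, a = state, 0.0, action
            for t in range(horizon):
                s, r = simulate(s, a)
                acc += (gamma ** t) * r
                a = base_policy(s)  # base policy takes over after step one
            total += acc
        return total / n_rollouts
    return max(actions, key=score)
```

Note that the greedy choice depends only on differences between action scores, which is the observation that motivates training on value differences rather than on absolute values.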

Average-Case Performance of Rollout Algorithms for Knapsack Problems

Rollout algorithms have demonstrated excellent performance on a variety of dynamic and discrete optimization problems. Interpreted as an approximate dynamic programming algorithm, a rollout algorithm estimates the value-to-go at each decision stage by simulating future events while following a heuristic policy, referred to as the base policy. While in many cases rollout algorithms are guarantee...
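
To make the value-to-go estimation concrete, here is a small sketch of a rollout algorithm for the deterministic 0-1 knapsack problem, using a greedy value-density heuristic as the base policy; the specific heuristic is an illustrative assumption.

```python
def greedy_base(values, weights, capacity, start):
    """Base heuristic: take remaining items in order of value density
    whenever they still fit."""
    order = sorted(range(start, len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    total = 0.0
    for i in order:
        if weights[i] <= capacity:
            total += values[i]
            capacity -= weights[i]
    return total

def rollout_knapsack(values, weights, capacity):
    """At each decision stage, estimate the value-to-go of taking vs.
    skipping item i by completing the solution with the base heuristic,
    then commit to the better-scoring decision."""
    total = 0.0
    for i in range(len(values)):
        skip = greedy_base(values, weights, capacity, i + 1)
        take = (values[i] + greedy_base(values, weights,
                                        capacity - weights[i], i + 1)
                if weights[i] <= capacity else float("-inf"))
        if take >= skip:
            total += values[i]
            capacity -= weights[i]
    return total
```

For example, `rollout_knapsack([6, 10, 12], [1, 2, 3], 5)` returns 22.0, which is optimal for that instance; by construction the rollout solution is never worse than the base heuristic's.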

Publication date: 2010